VPE: Variational Policy Embedding for Transfer Reinforcement Learning
Reinforcement Learning methods are capable of solving complex problems, but
the resulting policies can perform poorly in environments that differ even
slightly from the training environment. In robotics especially, training and
deployment conditions often vary, and data collection is expensive, making
retraining undesirable. Training in simulation keeps training times feasible
but suffers from a reality gap when the policy is deployed in the real world.
This raises the need for efficient adaptation of policies to new
environments. We treat this as a problem of transferring knowledge within a
family of similar Markov decision processes.
For this purpose we assume that Q-functions are generated by some
low-dimensional latent variable. Given such a Q-function, we can find a
master policy that adapts to different values of this latent variable. Our
method learns both the generative mapping and an approximate posterior over
the latent variables, enabling identification of policies for new tasks by
searching only in the latent space rather than in the space of all policies.
The low-dimensional latent space and the master policy found by our method
enable policies to adapt quickly to new environments. We demonstrate the
method both on a pendulum swing-up task in simulation and for
simulation-to-real transfer on a pushing task.
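The search-in-latent-space idea above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: a simple cross-entropy method stands in for the latent search, and `evaluate_return` is a hypothetical stand-in for rolling out the master policy conditioned on a latent value z in the new environment.

```python
import numpy as np

def adapt_latent(evaluate_return, dim=2, iters=20, pop=50, elite=10, seed=0):
    """Search only the low-dimensional latent space for the value of z that
    maximizes return, rather than searching the space of all policies.
    A simple cross-entropy method stands in for the search procedure."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        zs = rng.normal(mu, sigma, size=(pop, dim))
        returns = np.array([evaluate_return(z) for z in zs])
        elites = zs[np.argsort(returns)[-elite:]]   # keep the best samples
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu

# Toy stand-in for "roll out the master policy conditioned on z and
# measure episodic return" in the new environment:
target = np.array([0.7, -0.3])
returns_for = lambda z: -float(np.sum((z - target) ** 2))
z_star = adapt_latent(returns_for)
```

Because only a handful of latent dimensions are searched, even such a naive derivative-free method converges quickly compared to adapting full policy parameters.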
Global Search with Bernoulli Alternation Kernel for Task-oriented Grasping Informed by Simulation
We develop an approach that benefits from large simulated datasets and takes
full advantage of the limited online data that is most relevant. We propose a
variant of Bayesian optimization that alternates between using informed and
uninformed kernels. With this Bernoulli Alternation Kernel we ensure that
discrepancies between simulation and reality do not hinder adapting robot
control policies online. The proposed approach is applied to a challenging
real-world problem of task-oriented grasping with novel objects. Our further
contribution is a neural network architecture and training pipeline that use
experience from grasping objects in simulation to learn grasp stability scores.
We learn task scores from a labeled dataset with a convolutional network, which
is used to construct an informed kernel for our variant of Bayesian
optimization. Experiments on an ABB YuMi robot with real sensor data
demonstrate the success of our approach, despite the challenge of fulfilling
task requirements under high uncertainty over the physical properties of
objects.
Comment: To appear in the 2nd Conference on Robot Learning (CoRL) 2018
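The alternation idea can be sketched in a few lines of Bayesian optimization. Everything here is a simplified, hypothetical stand-in: a toy 1-D objective, a GP with a plain RBF as the uninformed kernel, an "informed" kernel that measures similarity through a (deliberately biased) simulation-derived score, and a UCB acquisition rule; the Bernoulli draw at each iteration selects which kernel the surrogate uses.

```python
import numpy as np

def rbf(a, b, ell=0.3):
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ell ** 2)

def informed(a, b, score, ell=0.3):
    # Hypothetical informed kernel: similarity measured in the space of a
    # simulation-derived score g(x) instead of raw input parameters.
    return rbf(score(a), score(b), ell)

def gp_posterior(k, X, y, cand, noise=1e-4):
    L = np.linalg.cholesky(k(X, X) + noise * np.eye(len(X)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    Kx = k(X, cand)
    mean = Kx.T @ alpha
    v = np.linalg.solve(L, Kx)
    var = np.diag(k(cand, cand)) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

def bak_bo(f, score, bounds, iters=15, p=0.5, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(bounds[0], bounds[1], size=(3, 1))
    y = np.array([f(x) for x in X])
    cand = np.linspace(bounds[0], bounds[1], 200)[:, None]
    for _ in range(iters):
        if rng.random() < p:                 # Bernoulli draw: informed kernel
            k = lambda a, b: informed(a, b, score)
        else:                                # ... or the uninformed kernel
            k = rbf
        mean, var = gp_posterior(k, X, y, cand)
        x_next = cand[np.argmax(mean + 2.0 * np.sqrt(var))]   # UCB rule
        X, y = np.vstack([X, x_next]), np.append(y, f(x_next))
    best = np.argmax(y)
    return X[best], y[best]

# Toy setting: the real objective peaks at 0.6; the simulation score is biased.
sim_score = lambda x: -(x - 0.55) ** 2
x_best, y_best = bak_bo(lambda x: -(x[0] - 0.6) ** 2, sim_score, (0.0, 1.0))
```

The point of the alternation is visible even in this toy: iterations that use the informed kernel exploit simulation knowledge, while iterations with the uninformed kernel prevent a sim-to-real discrepancy from permanently misleading the search.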
Towards Task-Prioritized Policy Composition
Combining learned policies in a prioritized, ordered manner is desirable
because it allows for modular design and facilitates data reuse through
knowledge transfer. In control theory, prioritized composition is realized by
null-space control, where low-priority control actions are projected into the
null-space of high-priority control actions. Such a method is currently
unavailable for Reinforcement Learning. We propose a task-prioritized
composition framework for Reinforcement Learning built around a new
concept: the indifferent-space of Reinforcement Learning policies. Our
framework has the potential to facilitate knowledge transfer and modular design
while greatly increasing data efficiency and data reuse for Reinforcement
Learning agents. Further, our approach can ensure high-priority constraint
satisfaction, which makes it promising for learning in safety-critical domains
like robotics. Unlike null-space control, our approach allows learning globally
optimal policies for the compound task by online learning in the
indifferent-space of higher-level policies after initial compound policy
construction.
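For contrast, the null-space composition from control theory that the abstract refers to can be written in a few lines; the 3-DOF Jacobian and action vectors below are purely hypothetical numbers chosen for illustration.

```python
import numpy as np

def compose_prioritized(a_high, J_high, a_low):
    """Null-space composition as used in control theory: the low-priority
    action is projected into the null space of the high-priority task
    Jacobian, so it can never interfere with the high-priority task."""
    J_pinv = np.linalg.pinv(J_high)
    N = np.eye(J_high.shape[1]) - J_pinv @ J_high   # null-space projector
    return a_high + N @ a_low

# Hypothetical 3-DOF example: the high-priority task constrains joint 1 only.
J = np.array([[1.0, 0.0, 0.0]])
a_hi = np.array([0.5, 0.0, 0.0])
a_lo = np.array([9.0, 0.2, -0.1])   # its joint-1 component is projected out
a = compose_prioritized(a_hi, J, a_lo)
```

The projection guarantees high-priority constraint satisfaction but is purely local; the framework above aims for the same guarantee while still allowing globally optimal learning for the compound task.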
A Stack-of-Tasks Approach Combined with Behavior Trees: a New Framework for Robot Control
Stack-of-Tasks (SoT) control allows a robot to simultaneously fulfill a
number of prioritized goals formulated in terms of (in)equality constraints in
error space. Since this approach solves a sequence of Quadratic Programs (QP)
at each time-step, without taking into account any temporal state evolution, it
is suitable for dealing with local disturbances. However, its limitation lies
in the handling of situations that require non-quadratic objectives to achieve
a specific goal, as well as situations where countering the control disturbance
would require a locally suboptimal action. Recent works address this
shortcoming by exploiting Finite State Machines (FSMs) to compose the tasks in
such a way that the robot does not get stuck in local minima. Nevertheless, the
intrinsic trade-off between reactivity and modularity that characterizes FSMs
makes them impractical for defining reactive behaviors in dynamic environments.
In this letter, we combine the SoT control strategy with Behavior Trees (BTs),
a task switching structure that addresses some of the limitations of the FSMs
in terms of reactivity, modularity and re-usability. Experimental results on a
Franka Emika Panda 7-DOF manipulator show the robustness of our framework,
which allows the robot to benefit from the reactivity of both SoT and BTs.
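The task-switching structure that BTs contribute can be sketched with two classic node types. This is a generic BT skeleton, not the letter's implementation; the leaves here are hypothetical stand-ins for what would, in the proposed framework, trigger SoT QP controllers.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3

class Fallback:
    """Tick children left to right, returning the first non-FAILURE status.
    Re-ticking the tree from the root each control cycle is what gives BTs
    their reactivity compared to FSMs."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status
        return Status.FAILURE

class Sequence:
    """Tick children left to right, returning the first non-SUCCESS status."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS

class Leaf:
    def __init__(self, fn):
        self.fn = fn
    def tick(self):
        return self.fn()

# Hypothetical fragment: keep running a controller until the goal holds.
goal = {"reached": False}
tree = Fallback(
    Leaf(lambda: Status.SUCCESS if goal["reached"] else Status.FAILURE),
    Leaf(lambda: Status.RUNNING),   # stand-in for a SoT QP controller tick
)
s1 = tree.tick()          # controller keeps running
goal["reached"] = True
s2 = tree.tick()          # condition now preempts the controller
```

Because the condition leaf is re-evaluated on every tick, the behavior switches as soon as the world changes, without the explicit transition bookkeeping an FSM would require.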
Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning
Rearranging objects on a tabletop surface by means of nonprehensile
manipulation is a task which requires skillful interaction with the physical
world. Usually, this is achieved by precisely modeling physical properties of
the objects, robot, and the environment for explicit planning. In contrast, as
explicitly modeling the physical environment is not always feasible and
involves various uncertainties, we learn a nonprehensile rearrangement strategy
with deep reinforcement learning based on only visual feedback. For this, we
model the task with rewards and train a deep Q-network. Our potential
field-based heuristic exploration strategy reduces the number of collisions
that lead to suboptimal outcomes, and we actively balance the training set to
avoid bias towards poor examples. Our training process leads to quicker
learning and better performance on the task as compared to uniform exploration
and standard experience replay. We provide empirical evidence from simulation
that our method achieves a success rate of 85%, show that our system can cope
with sudden changes in the environment, and compare its performance with
human-level performance.
Comment: 2018 International Conference on Robotics and Automation
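A potential-field exploration heuristic of the kind the abstract mentions can be sketched as follows. All details are illustrative assumptions rather than the paper's method: a discrete action set, a quadratic attractive term toward the goal, an inverse-distance repulsive term around obstacles, and a softmax over the resulting scores replacing the uniform random choice in epsilon-greedy exploration.

```python
import numpy as np

def potential_field_action(agent, goal, obstacles, actions, rng,
                           k_att=1.0, k_rep=0.5, temp=1.0):
    """Exploration heuristic: instead of picking a uniformly random action,
    score each discrete action by how far it descends an attractive/repulsive
    potential, then sample from a softmax over those scores. Actions that
    drive toward obstacles (collisions) become very unlikely."""
    def potential(p):
        u = 0.5 * k_att * np.sum((p - goal) ** 2)        # attraction to goal
        for o in obstacles:
            u += k_rep / (np.linalg.norm(p - o) + 1e-6)  # repulsion
        return u
    scores = np.array([-potential(agent + a) for a in actions])
    probs = np.exp((scores - scores.max()) / temp)
    probs /= probs.sum()
    return rng.choice(len(actions), p=probs)

rng = np.random.default_rng(0)
actions = [np.array(a) for a in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
agent, goal = np.array([0.0, 0.0]), np.array([5.0, 0.0])
obstacles = [np.array([0.0, 1.0])]
samples = [potential_field_action(agent, goal, obstacles, actions, rng)
           for _ in range(500)]
```

In the toy setting above, the action toward the goal dominates the samples while the action into the obstacle is essentially never drawn, which is the mechanism by which such a heuristic cuts down collision-heavy exploration.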
Probabilistic consolidation of grasp experience
We present a probabilistic model for the joint representation of several sensory modalities and action parameters in a robotic grasping scenario. Our non-linear probabilistic latent variable model encodes relationships between grasp-related parameters, learns the importance of features, and expresses confidence in its estimates. The model learns associations between stable and unstable grasps that it experiences during an exploration phase. We demonstrate the applicability of the model for estimating grasp stability, correcting grasps, identifying objects based on tactile imprints, and predicting tactile imprints from object-relative gripper poses. We performed experiments on a real platform with both known and novel objects, i.e., objects the robot was trained with and previously unseen objects. Grasp correction had a 75% success rate on known objects and 73% on novel objects. We compared our model to a traditional regression model, which succeeded in correcting grasps in only 38% of cases.
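The core operation of such a joint model, i.e. conditioning on whatever modalities are observed to predict the rest along with a confidence, can be illustrated with a linear-Gaussian stand-in for the paper's non-linear latent variable model. The three variables and their relationships below are entirely synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic joint data: [gripper pose offset, tactile feature, stability].
pose = rng.normal(0.0, 1.0, 500)
tactile = 0.8 * pose + rng.normal(0.0, 0.1, 500)
stability = 0.5 - 0.6 * tactile + rng.normal(0.0, 0.05, 500)
X = np.column_stack([pose, tactile, stability])
mu, S = X.mean(axis=0), np.cov(X.T)

def condition(mu, S, idx_obs, x_obs):
    """Condition the joint Gaussian on the observed dimensions; the
    conditional covariance expresses the model's confidence."""
    idx_q = [i for i in range(len(mu)) if i not in idx_obs]
    A = S[np.ix_(idx_q, idx_obs)] @ np.linalg.inv(S[np.ix_(idx_obs, idx_obs)])
    m = mu[idx_q] + A @ (x_obs - mu[idx_obs])
    V = S[np.ix_(idx_q, idx_q)] - A @ S[np.ix_(idx_obs, idx_obs)] @ A.T
    return m, V

# Predict tactile imprint and stability from a gripper pose offset of 1.0.
m, V = condition(mu, S, [0], np.array([1.0]))
```

The same conditioning machinery covers all of the tasks listed in the abstract (stability estimation, grasp correction, imprint prediction) by swapping which dimensions are observed and which are queried.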